Introduction
In this project, we analyzed Distributed Acoustic Sensing (DAS) data
captured as a 2D numpy matrix, representing strain rates along a fiber
optic cable installed on Jana Pawła II street. The data reflects the
vibrations caused by passing trams, trucks, and cars. Using image
processing and signal analysis techniques, we identified vehicle,
tracked their movement, and estimated their velocities, providing
valuable insights into traffic patterns on this busy street.
The task is very difficult, therefore we tested 2 approches. None of
these is the best, however, some scenarios are better in one case and
some in other.
Preprocessing
preprocessing_flow_chart
example_filename = "090332.npy"
img = load_from_file(example_filename).to_numpy()
To reduce noise and magnify the data we tried many methods in various
combinations. The most fulfilling approach consists of the following
steps:
- Absolute value - The values can be negative, but we
are interested only in the magnitude of the signal, so we can take the
absolute value
img = np.abs(img)
- Clip to the 99% percentile of intensity
distribution - As we can see from the plot, data contains
outliers, while most of the data lies in a one region
plt.title("Data distribution before removing outliers")
plt.hist(img.flatten(), bins=100)
plt.show()

high = np.percentile(img, 99)
img = np.minimum(img, high)
plt.title("Data distribution after removing outliers")
plt.hist(img.flatten(), bins=100)
plt.show()

- Standardize the data in range 0-255 - To normalize
the data for consistent comparison and processing.
img = np.around(255 * (img - np.min(img)) / (np.max(img) - np.min(img))).astype(
np.uint8
)
plot_numpy(img, title="Standarized data")

- Non-local Means Denoising algorithm - We implement
fastNlMeansDenoising with agressive parameters to reduce noise
img = cv2.fastNlMeansDenoising(img, templateWindowSize=7, searchWindowSize=21, h=14)
plot_numpy(img, title="Non-local Means Denoising")

- blur - We blur the image with a kernel with the
greater vertical size, as the data is very narrow in the horizontal
dimension.
img = cv2.blur(img, (3, 41))
plot_numpy(img, title="Blurred")

- Erosion - morphological operation - As various
signals tend to be close to each other, and we don’t need much of a
aconnectivity as well as edges details, we perform an erosion
operation.
img = cv2.morphologyEx(
img,
cv2.MORPH_ERODE,
cv2.getStructuringElement(cv2.MORPH_RECT, (3, 9)),
iterations=1,
)
plot_numpy(img, title="Erosion")

- Binary conversion - Now we can perform thresholding
segmentation to get a binary image.
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
plot_numpy(img, title="Binary conversion")

Main idea
The main idea behind the two approaches is to detect lines in the
data, assuming contant speed of the vehicles
Then, the speed can be calculated from the formula:
def velocity_from_slope(slope: float) -> float:
return round(abs(1 / slope) * DX / DT, 2)
The slope is inverted, because we use the x-axis as the spatial
dimension and the y-axis as the time dimension. The velocity is
calculated in meters per second.
FIRST APPROACH
The idea of our first approach was to cluster the values from the
different data sources. Then we would apply linear regression within the
clusters, on the (x,y) cooridinates of non-zero values. The slope of the
regression line would give us the velocity of the vehicle.
method_flow_chart
1. Clustering
Because of the nature of the data and the fact that we want to
cluster datapoints by density rather than closeness (as in for example
KMeans) we chose the DBSCAN algorithm. It does
not require specifying the number of clusters in advance, making it a
versatile choice for exploratory data analysis.
DBSCAN (Density-Based Spatial Clustering of Applications with
Noise) is a clustering algorithm that groups together points that are
closely packed together, marking as outliers points that lie alone in
low-density regions. It requires two parameters: eps
(the maximum distance between two points to be considered neighbors) and
min_samples (the minimum number of points required to
form a dense region). DBSCAN is particularly effective for
discovering clusters of arbitrary shape and handling noise in the
data.
X = np.nonzero(img)
X = np.vstack(X).T
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
clustering = DBSCAN(eps=0.08, min_samples=300).fit(X_scaled)
no_of_clusters = int(np.max(clustering.labels_) + 1)
The algorithm detected python no_of_clusters clusters in
this exemplary data.
colors = generate_colors(no_of_clusters)
colored_clusters = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
for cluster_id in range(no_of_clusters):
# Indices of the points in the cluster
idx = X[clustering.labels_ == cluster_id, :]
colored_clusters[idx.T[0], idx.T[1], :] = colors[cluster_id]
plot_numpy(colored_clusters, title="Clustered data")

2. Linear Regression
Once we have separated clusters, we can apply linear
regression to each of them to estimate the velocity of the moving
object.
As some clusters are separated noise, we discard these with R2 <
0.3.
lines = []
velocities = []
for cluster_id in range(no_of_clusters):
idx = X[clustering.labels_ == cluster_id, :]
x_coords = idx[:, 1].reshape(-1, 1)
y_coords = idx[:, 0]
model = LinearRegression()
model.fit(x_coords, y_coords)
slope = model.coef_[0]
intercept = model.intercept_
x_start = np.min(x_coords)
x_end = np.max(x_coords)
r2 = model.score(x_coords, y_coords)
print(f"R2: {r2}")
if r2 < 0.3:
print(f"Discarding object with R2 < 0.3 ({r2})")
continue
velocity = velocity_from_slope(slope)
print(f"Detected velocity: {velocity} m/s ({''})")
velocities.append(velocity)
lines.append((slope, intercept, x_start, x_end))
## LinearRegression()
## R2: 0.92240659229734
## Detected velocity: 27.13 m/s ()
## LinearRegression()
## R2: 0.8707396109109362
## Detected velocity: 18.26 m/s ()
## LinearRegression()
## R2: 0.0029409018431784117
## Discarding object with R2 < 0.3 (0.0029409018431784117)
plot_numpy_with_lines(img, lines, title="Detected velocities")

SECOND APPROACH
In the second method, we approached the issue from a different angle.
First, we generated lines using Houghline and then, we clusterd
them.
method_2_flow_chart
1. Hough Lines
The Hough Line Transform is a feature extraction
technique used in image analysis and computer vision to detect straight
lines in an image. It works by transforming points in the image space to
the parameter space, where each point in the image space corresponds to
a sinusoidal curve in the parameter space. Lines in the image space are
identified by finding intersections of these curves in the parameter
space. This method is robust to noise and can detect lines even if they
are broken or partially obscured. It is commonly used in applications
like lane detection in autonomous driving and shape analysis.
To cluster lines we used the same algorithm like in the first method
- DBSCAN.
Implementation
Because our data is very narrow (python w) and the lines
are visible on the streched image, it makes sense to resize the date, so
the HoughLines algorithm has more freedom in term of slope
selection
aspect_ratio = 12 / 16
h, w = img.shape
new_w = int(aspect_ratio * h)
img = cv2.resize(img, (new_w, h), interpolation=cv2.INTER_NEAREST)
plot_numpy(img)

Now we can apply HoughLines algoritm, which will detect many lines
going through the dense regions
img_before = img.copy()
lines = cv2.HoughLinesP(
img, # Input edge image
1, # Distance resolution in pixels
np.pi / 180, # Angle resolution in radians
threshold=500, # Min number of votes for valid line
minLineLength=1500, # Min allowed length of line
maxLineGap=400, # Max allowed gap between line for joining them
)
Because some of the lines may be perfectly vertical, and we are
interested in the slopes which for them is an infinity we erase them
valid_lines = lines[~(lines[:, :, 0] == lines[:, :, 2]), :]
Now we can plot the lines
img_lines = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
color_palette = generate_colors(len(valid_lines))
for i in range(len(valid_lines)):
_ = cv2.line(
img_lines,
(valid_lines[i, 0], valid_lines[i, 1]),
(valid_lines[i, 2], valid_lines[i, 3]),
[int(c) for c in color_palette[i]],
1,
cv2.LINE_AA,
)
plot_numpy(img_lines)

2. Clustering
We now have to separate the line clusters and aggregate them.
We tried using (slope, intercept) and (center_x, center_y)
combinations
valid_lines_directional = np.apply_along_axis(
lambda x: np.array([*get_slope_and_intercept(*x)]), axis=1, arr=valid_lines
)
valid_lines_centers = np.apply_along_axis(
lambda x: np.array([(x[0] + x[2]) / 2, (x[1] + x[3]) / 2]),
axis=1,
arr=valid_lines,
)
Normalizing the data
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(valid_lines_centers)
clustering = DBSCAN(eps=0.15, min_samples=16).fit(X_scaled)
no_of_clusters = np.max(clustering.labels_) + 1
The DBSCAN algorithm has in-built outliers detection, so
we ommit lines marked as cluster -1
(python sklearn implementation)
img_clusters = cv2.cvtColor(img_before, cv2.COLOR_GRAY2RGB)
color_palette = generate_colors(no_of_clusters)
line_colors = np.apply_along_axis(lambda x: color_palette[x], 0, clustering.labels_)
line_colors[ clustering.labels_ == -1 ] = [0,0,255]
for i in range(len(valid_lines)):
_ = cv2.line(
img_clusters,
(valid_lines[i, 0], valid_lines[i, 1]),
(valid_lines[i, 2], valid_lines[i, 3]),
[int(c) for c in line_colors[i]],
1,
cv2.LINE_AA,
)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=clustering.labels_)
## <matplotlib.collections.PathCollection object at 0x767381120e90>
plt.show()

plot_numpy(img_clusters)

average_lines = []
velocities = []
for cluster_id in range(no_of_clusters):
average_line = np.mean(valid_lines_directional[clustering.labels_ == cluster_id], axis=0)
x_min = 0
x_max = new_w
average_lines.append([*average_line, x_min, x_max])
slope = average_line[0]
velocities.append(velocity_from_slope(slope))
img_average_lines = cv2.cvtColor(img_before, cv2.COLOR_GRAY2RGB)
plot_numpy_with_lines(img_average_lines, average_lines)
